Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions
Abstract
This paper studies the asymptotic behavior of constant step Stochastic Gradient Descent for the minimization of an unknown function, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular amongst practitioners, and whose properties have recently been studied by Bolte and Pauwels. Since the chosen operator is in general not a mean of such subgradients, it has been assumed in the literature that an oracle of the subdifferential of the unknown function is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small step size regime, the interpolated trajectory of the algorithm converges in probability (in the compact convergence sense) towards the set of solutions of a particular differential inclusion: the subgradient flow. Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of this kernel converge weakly to the invariant distributions of the differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the unknown function.
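The iteration the abstract studies can be sketched on a toy problem. The following is a minimal illustration, not the paper's experiments: constant-step SGD on the non-smooth objective f(x) = E|x - Z| with Z standard Gaussian, whose minimizer is the median 0; since |x - z| is not differentiable at x = z, the step uses an element of the Clarke subdifferential, sign(x - z). The step size and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.01   # constant step size
x = 5.0        # initialization

for _ in range(10_000):
    z = rng.standard_normal()
    # Clarke subgradient of |x - z| in x (taking 0 at the kink):
    x -= gamma * np.sign(x - z)

# With a small constant step, the iterates eventually oscillate
# in a neighborhood of the critical point x* = 0.
print(abs(x))
```

This matches the qualitative conclusion above: the trajectory does not converge to a single point, but for small gamma it is, with large probability, eventually trapped near the set of critical points.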
Similar resources
Convergence diagnostics for stochastic gradient descent with constant step size
Iterative procedures in stochastic optimization are typically comprised of a transient phase and a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in a convergence region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transiti...
Convergence Rate of Sign Stochastic Gradient Descent for Non-convex Functions
The sign stochastic gradient descent method (signSGD) utilises only the sign of the stochastic gradient in its updates. For deep networks, this one-bit quantisation has surprisingly little impact on convergence speed or generalisation performance compared to SGD. Since signSGD is effectively compressing the gradients, it is very relevant for distributed optimisation where gradients need to be a...
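The signSGD update described above is simple enough to state in a few lines. This is a hypothetical illustration (the function name and values are invented): each coordinate of the stochastic gradient is quantised to one bit via its sign before the step is taken.

```python
import numpy as np

def signsgd_step(w, stoch_grad, lr):
    # Use only the sign of each gradient coordinate (one-bit quantisation).
    return w - lr * np.sign(stoch_grad)

w = np.array([0.3, -1.2, 0.0])
g = np.array([2.5, -0.1, 0.0])   # a stochastic gradient estimate
w_new = signsgd_step(w, g, lr=0.1)
```

Note that the step length per coordinate is always the learning rate, regardless of the gradient's magnitude, which is what makes the scheme attractive for gradient compression in distributed settings.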
Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems
The Burer-Monteiro [1] decomposition (X = Y Y T ) with stochastic gradient descent is commonly employed to speed up and scale up matrix problems including matrix completion, subspace tracking, and SDP relaxation. Although it is widely used in practice, there exist no known global convergence results for this method. In this paper, we prove that, under broad sampling conditions, a first-order ra...
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required nontrivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate t...
Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates
With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
Journal
Journal title: Set-Valued and Variational Analysis
Year: 2022
ISSN: 1877-0541, 1877-0533
DOI: https://doi.org/10.1007/s11228-022-00638-z